We are avid music lovers. We listen to both critically acclaimed music and trendy pop songs. In daily life, we found some albums have great reviews but few people have heard of them; while some hot TikTok songs are spurned by most people. Of course, a lot of good music is both popular and highly rated.
Therefore, we want to explore the relationship between the review and stream. Meanwhile, we also want to give some answers to these two questions:
Bearing these questions in mind, we explore and visualize the data from Metacritic (for critic reviews), Rate Your Music (for user ratings), and Spotify (for popularity).
We are aware that there are many existent projects focusing on analyzing music-related data; for example, some projects purely explore and present the Spotify data. However, our project focuses on three different sources, and we mainly use the popularity information provided by Spotify. Additionally, we emphasize on the relationship between different sources and different aspects of the music.
Now please join us on this wild music EDAV journey!
Placeholder
Placeholder
In the part of data exploratory analysis, we will first introduce the respective data of Metacritic Ratings and Rate Your Music Ratings in turn, then compare and analyze ratings from Metacritic and RYM. Next we will introduce Spotify Popularity and Tracks Variables, and finally we will cross-analyze data from these three data source.
Note: For reading convenience, the following plots whose feature are filled in purple denotes that the main data is from Metacritic, blue for RYM and green for Spotify. Orange represents cross-analysis. :)
What’s the distribution of critic ratings of Albums that were released between 2018 and 2022 on Metacritic look like? How does it change over time? The following graph tells us the answer.
First, from the marginal density graph in the right part of the graph, we can tell that the distribution of critic ratings is a little bit left skewed. Most albums’ ratings are within the range [70,90]. There are, however, a few albums with really low ratings. For example, one album’s rating on Metacritic doesn’t even reach 40 out of 100. Also, there are some extraordinary albums that almost get a full score. Now we print the album with the highest score and the one with the lowest score.
## [1] "Fetch the Bolt Cutters by Fiona Apple with score 98"
## [1] "DUMMY BOY by 6ix9ine with score 38"
Second, the top density curve of the graph tells us how the number of ratings of albums on Metacritic changes over time. We can see that the number of ratings is steadily decrease over time. Since every album only has one score on Metacritic, it could be that the phenomenon that the number of albums released is decreased over time leads to the decline.
Lastly, the purple dot in the graph shows the album ratings on Metacritic and corresponding release time. At first glance, there is no obvious pattern or trend in it: points are roughly evenly distributed, and there is no sudden change along the release date. However, there is a small interesting finding: it seems that there exists invisible line that separate the points between different year. It is weird at first because we didn’t make any restriction on the date. However, after second thought, it is quite reasonable, artists also celebrate Christmas and New Year! So at the end of each year, there are basically no new albums released.
The three pattern also exist in the below plot for RYM ratings. However, the variance of user ratings are much larger than critic reviews. Why? The lowest rating in Metacritic is around 40 while in RYM, it is below 1. Considering that the critic ratings range from 0 to 100 and user ratings range from 0 to 5, 40 in 100 is a much tender rating compared to 1 in 5. We can also observe that the number of score is decline over time and also there are invisible lines between years.
Next, we will analysis relationship between the number of users who rate the album and the album ratings through an interactive scatter plot. You can hover over the point and it will show you the name of the album you point at. You can also select a rectangle to see the partial data and dive deeper.
The point are colored according to the Metacritic score, the deeper of the color means the higher the score is. Along the y-axis, we can see that roughly the color of the points turns deeper, which means overall the RYM ratings and Metacritic Score have positive relationships.(We will discuss the relationship later)
Regarding the relationship between the number of users who rate the album and the album ratings, we have following findings: - There is no albums with high votes and low ratings. It is reasonable because people have more incentive to rate an album if they think it is great, otherwise they will just omit it. There is an exception on the album “Jesus Is King” by Kanye West with 27k votes and 2.38 rating. Maybe we should all listen to this album thouroughly to find out why:) - For albums with more than a few votes, the average rating increases with the number of votes. - No albums in the area where represents high votes and max ratings - There are points who are lower than expected ratings and should be seen as outliers. (e.g. “Jesus Is King”) - Albums with low votes have relatively full range over y-axis.
In this part we will compare between metacritic ratings and RYM ratings.
The following boxplot shows the distribution of Metacritic ratings and RYM ratings in each year. We have following findings:
We show the top 15 rated singers/artists in Metacritic and RYM in the following graph. Unfortunately, no singer is in both the top 15 of Metacritic ratings and the top 15 of RYM ratings.T his shows how different the tastes of professional music critics are from those of the public.
Does a high Metscritic score indicate a high RYM rating? Roughly YES!
The following scatter plot with smooth line shows the relationship between RYM Score and Metacritic score. The line has a positive slope with relative smally confidence interval(grey area around the line).
However, the answer is not so sure in the case that one of the score is extreme low. We can see that in the high subregion, the points are very dense. But in the lower left corner of the graph, the distribution of points begins to scatter. This shows that if the Metacritic score is particularly low, we are no longer very confident in our answer. That is, a low score of Metacritic no longer indicates a low score of RYM. Vice versa.
The following graph have 5 facets each of which contains scatter plot with smooth line in different year. Overall the patterns are same with the above graph. However, there are subtle different in the lower left corner and higher right corner between different year. For example, we have more confidence with our answer to the question “Does a high Metscritic score indicate a high RYM rating?” in year 2019 and 2021 for extremely low rating cases(lower left corner).
The below scatter plot with marginal density plot shows how Spotify popularity of album distributed and how it changed over time. Different to ratings, the distribution of popularity is a bit right skewed rather than left skewed, which tells us that most album have relatively low popularity. It is worth noting that the popularity here means the maximum popularity of songs in a album. We will discuss the commons and difference of the distribution between maximum popularity, minimum popularity, average popularity and median popularity later.
The graph below shows the distribution of the box plot of min, max, average and median popularity in different years. We can see that the growth trend and distribution changes are similar for these four graphs. We also find that the popularity increases with the year. This makes sense because the popularity provided by Spotify is the station’s song popularity at a certain date, and people usually play albums just released more frequently rather than those released years ago.
Next, we draw a parallel coordinate plot on different representative
methods of an album(avg, max, min and median). The major trend is that
all these four variables are positive related. We can conclude this by
the nearly horizontal lines between them. However, there do exist
exception, we can also find some lines, who has extreme low average
popularity, high maximum popularity, low minimum popularity and high
median popularity. This means that, there ia a small fraction of albums,
in which the variance of popularity of different tracks/songs can be
very large.
The analysis so far has been at the album level; in this subsection, we will focus on the song level.
Have you ever wonder what are the factors that contribute to the popularity of a song? The common practice is to pick two variables that you think are correlated and graph or analyze them. However with Spotify’s dataset of over 15 variables, a two-by-two analysis would take a lot of time and effort and be extremely inefficient.
To explore this, we plotted a heat map of the correlations between the variables in the Spotify track dataset. Color blue represents positive correlations and red represents negative correlations. Finding the rows and columns correlated with popularity, we can find that popularity is very positively correlated with the danceability of the song and negatively correlated with the instrumentalness of the song. Is this really the case?
For further analysis, we plotted the scatter plot of popularity vs. instrumentalness and the scatter plot of popularity vs. danceability.
The plot on the left shows popularity vs. instrumentalness. We can see that the points are mostly concentrated at the ends of the x-axis, and there is a tendency for the popularity to decrease as the instrumentalness rises. In addition, we also find that almost all points with high popularity are concentrated in the region where the instrumentalness is particularly small (upper left corner).
Do albums with high ratings also have high popularity?
To answer this question, we drew scatter plots of Spotify popularity and metacritic rating as well as scatter plots of Spotify popularity and Rate Your Music rating. We can see that the distribution of these two plots is about the same.
The following two graphs show the relationship between Spotify popularity and metacritic rating as well as the relationship between Spotify popularity and Rate Your Music rating in different years. Although in general, the trends and distribution of these graphs are similar to the ones we just mentioned, in some years, there are some edge cases that lead to subtle changes in the situation. For example, in the second graph below, which shows Spotify popularity and Rate Your Music rating, in 2021, albums with higher ratings no longer enjoy higher popularity, which is a curious phenomenon. A closer look reveals that there is an outlier that causes the fitted curve to shift downward.
In order to better analyze the relationship between popularity and rating, and considering that these are all continuous variables, we plotted the Parallel coordinate plot and properly changed the order of the variables for better observation.
Overall, the relationship between the five variables looks like a capital M, which means that the majority of albums fall within a range where high ratings and low popularity are associated overall. This corresponds to the graph we just presented, where the center of the graph is in the lower right corner. Of course, there are some exceptions in the Parallel coordinate plot, such as the inverted M. These are albums with low ratings but very high popularity.
Below is the Parallel coordinate plot by year, and we can see that in addition to showing the overall trend, the exceptions, i.e., the inverted M shape, become more and more pronounced as the year increases.
We can see that in 2018, the number of inverted M shapes lines is small and they do not stand out, meaning that the minimum popularity is not high. However, in 2022, there are more albums that have this very low rating, yet are extremely popular, both in terms of minimum, maximum and median popularity.
Perhaps this is related to the gradual popularity of short videos and their viral spread of catchphrases. In any case, it seems to show that our time is being occupied by entertainment little by little.
This section allow users to intractively explore the relationship between the review and popularity of albums. We selected top 100 albums on Metacritic in 2022. We also present the relationship between the review and other aspects like danceability and energy. The possibilities are unlimited.
On the left half of the page, we rank the albums according to the Metacritic chart. The right half of the page is interactive. First, it present the average popularity of the album on Spotify in a way that resembles a Cleveland dot plot. As we can discover, locally, there is no strong association between the review rank and the average popularity. However, when we broaden our horizon to the whole chart, we can see a clear trend that albums with higher review rank are more popular.
Moreover, users can click on the circle representing an album. Then the circle will expand to a popularity range of the tracks in that album. This provides more detailed information about the popularity of the album. Also, when users hover on the circle representing a track, it will show the track name.
With this intractive gragh, users can easily figure out the review rank, average popularity of an album, and popularity of specific tracks. These dimensions cannot be easily presented in a static graph without clustering the graph.
We also provide two other aspects: danceability and energy. After clicking on the buttons on the top, the graph will transition to present another aspect.
Placeholder